Looking at the data

We plot the data and see no obvious large difference in completion time between the high- and low-debt versions.

d.both_completed %>%
  ggplot(aes(x=time/60, fill=high_debt_version)) + 
  geom_boxplot() +
  labs(
    title = "Distribution of time measurements for the different debt levels",
    subtitle = "Notice! Log10 x-scale",
    x = "Time (min)"
  ) +
  scale_y_continuous(breaks = NULL) +
  scale_x_log10() +
  scale_fill_discrete(name = "Debt Level", labels = c("High Debt", "Low Debt"), guide = guide_legend(reverse = TRUE)) +
  theme_minimal()

Descriptive Statistics

d.both_completed %>%
  pull(time) %>% 
  summary()
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   442.0   937.5  1358.0  1842.2  2423.8  7298.0
sprintf("Variance: %.0f", var(pull(d.both_completed, time)))
## [1] "Variance: 1880646"

Initial model

As the variance is much greater than the mean, we use a negative binomial family, which models the dispersion separately from the mean.

We include high_debt_version as a predictor in our model, as this variable represents the very effect we want to measure. We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.
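As a quick sanity check on the overdispersion claim, the variance-to-mean ratio can be computed directly. A minimal sketch, assuming `d.both_completed` and its `time` column as used above:

```r
library(dplyr)

# Variance-to-mean ratio of the time measurements; for a Poisson model this
# would be close to 1, so a ratio far above 1 supports negbinomial(), which
# models the dispersion with a separate shape parameter.
d.both_completed %>%
  summarise(
    mean_time = mean(time),
    var_time  = var(time),
    ratio     = var(time) / mean(time)
  )
```

With the summary statistics above (mean roughly 1842, variance roughly 1 880 646) the ratio is on the order of 1000.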

Selecting Priors

We iterate on the priors until the prior predictive distribution looks sane.

Base model with priors

time.with <- extendable_model(
  base_name = "time",
  base_formula = "time ~ 1 + high_debt_version + (1 | session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.95)
)

Default priors

prior_summary(time.with(only_priors = TRUE))

Selected priors

prior_summary(time.with(sample_prior = "only"))

Prior Predictive Check

pp_check(time.with(sample_prior = "only"), nsamples = 200) + 
  scale_x_log10()

Model fit

We check the posterior predictive distribution and see that the model fits the data well. Sampling also appears to have worked well: the Rhat values are close to 1 and the trace plots look healthy.
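The visual checks can be complemented with numeric convergence summaries; `rhat()` and `neff_ratio()` are brms helpers. A sketch, reusing the `time.with()` fit from above:

```r
fit <- time.with()

# All Rhat values should be close to 1 (commonly < 1.01) and the effective
# sample size ratios should not be vanishingly small.
max(rhat(fit), na.rm = TRUE)
min(neff_ratio(fit), na.rm = TRUE)
```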

Posterior Predictive check

Notice: log10 scale on x-axis.

pp_check(time.with(), nsamples = 100) +
  scale_x_log10()

Summary

summary(time.with())
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + (1 | session) 
##    Data: as.data.frame(data) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.41      0.15     0.09     0.72 1.00      720      735
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  7.53      0.15     7.24     7.83 1.00     1979
## high_debt_versionfalse    -0.18      0.17    -0.50     0.15 1.00     3994
##                        Tail_ESS
## Intercept                  2611
## high_debt_versionfalse     2740
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.64      1.08     1.92     6.08 1.00     1213     2177
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Sampling plots

plot(time.with(), ask = FALSE)

Model predictor extensions

# default prior for monotonic predictor
edlvl_prior <- prior(dirichlet(2), class = "simo", coef = "moeducation_level1")

We use loo to evaluate some possible extensions of the model.
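Several of the diagnostics below report observations with Pareto k above 0.7, so those PSIS-LOO estimates should be read with some caution. One remedy, sketched here with brms' built-in option, is to refit once per problematic observation:

```r
# reloo = TRUE refits the model with each high-k observation left out,
# replacing the unreliable importance-sampling estimates with exact ones.
# This is expensive: one refit per flagged observation.
loo(time.with(), reloo = TRUE)
```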

One variable

loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  
  # New model(s)
  time.with("work_domain"),
  time.with("work_experience_programming.s"),
  time.with("work_experience_java.s"),
  time.with("education_field"),
  time.with("mo(education_level)", edlvl_prior),
  time.with("workplace_peer_review"),
  time.with("workplace_td_tracking"),
  time.with("workplace_pair_programming"),
  time.with("workplace_coding_standards"),
  time.with("scenario"),
  time.with("group")
)

Comparison

loo_result[2]
## $diffs
##                                               elpd_diff se_diff
## time.with("scenario")                          0.0       0.0   
## time.with("education_field")                  -1.0       3.8   
## time.with("workplace_peer_review")            -1.5       2.8   
## time.with()                                   -1.5       2.2   
## time.with("mo(education_level)", edlvl_prior) -1.7       3.6   
## time.with("workplace_pair_programming")       -2.0       2.4   
## time.with("workplace_coding_standards")       -2.3       2.3   
## time.with("work_experience_java.s")           -2.4       2.2   
## time.with("work_experience_programming.s")    -2.6       2.3   
## time.with("workplace_td_tracking")            -2.8       1.9   
## time.with("group")                            -3.5       2.5   
## time.with("work_domain")                      -3.7       4.1

Diagnostics

loo_result[1]
## $loos
## $loos$`time.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.6  7.2
## p_loo        14.5  2.8
## looic       735.2 14.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   886       
##  (0.5, 0.7]   (ok)       14    31.8%   331       
##    (0.7, 1]   (bad)       4     9.1%   53        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("work_domain")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -369.7  6.4
## p_loo        16.6  2.5
## looic       739.5 12.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   656       
##  (0.5, 0.7]   (ok)       15    34.1%   135       
##    (0.7, 1]   (bad)       1     2.3%   30        
##    (1, Inf)   (very bad)  1     2.3%   33        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("work_experience_programming.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.7  7.3
## p_loo        15.6  3.1
## looic       737.3 14.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   799       
##  (0.5, 0.7]   (ok)       12    27.3%   258       
##    (0.7, 1]   (bad)       5    11.4%   25        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("work_experience_java.s")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.4  7.1
## p_loo        15.0  2.8
## looic       736.9 14.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     30    68.2%   425       
##  (0.5, 0.7]   (ok)       12    27.3%   388       
##    (0.7, 1]   (bad)       2     4.5%   45        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("education_field")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.1  6.2
## p_loo        12.9  2.0
## looic       734.1 12.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   749       
##  (0.5, 0.7]   (ok)       13    29.5%   286       
##    (0.7, 1]   (bad)       3     6.8%   106       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.8  6.9
## p_loo        13.2  2.6
## looic       735.6 13.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   685       
##  (0.5, 0.7]   (ok)       13    29.5%   672       
##    (0.7, 1]   (bad)       4     9.1%   83        
##    (1, Inf)   (very bad)  1     2.3%   19        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.6  6.4
## p_loo        12.8  2.3
## looic       735.2 12.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   557       
##  (0.5, 0.7]   (ok)       15    34.1%   326       
##    (0.7, 1]   (bad)       1     2.3%   47        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("workplace_td_tracking")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.9  8.0
## p_loo        16.1  3.8
## looic       737.8 16.0
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   895       
##  (0.5, 0.7]   (ok)       18    40.9%   405       
##    (0.7, 1]   (bad)       3     6.8%   167       
##    (1, Inf)   (very bad)  1     2.3%   5         
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("workplace_pair_programming")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.0  7.1
## p_loo        15.0  2.9
## looic       736.1 14.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   805       
##  (0.5, 0.7]   (ok)       13    29.5%   320       
##    (0.7, 1]   (bad)       3     6.8%   43        
##    (1, Inf)   (very bad)  1     2.3%   34        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("workplace_coding_standards")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.3  7.1
## p_loo        14.8  2.9
## looic       736.7 14.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     30    68.2%   747       
##  (0.5, 0.7]   (ok)        7    15.9%   600       
##    (0.7, 1]   (bad)       6    13.6%   60        
##    (1, Inf)   (very bad)  1     2.3%   14        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -366.1  8.0
## p_loo        17.3  3.7
## looic       732.1 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   786       
##  (0.5, 0.7]   (ok)       12    27.3%   328       
##    (0.7, 1]   (bad)       5    11.4%   51        
##    (1, Inf)   (very bad)  1     2.3%   14        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("group")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -369.5  6.8
## p_loo        16.1  2.8
## looic       739.0 13.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     23    52.3%   539       
##  (0.5, 0.7]   (ok)       18    40.9%   343       
##    (0.7, 1]   (bad)       3     6.8%   34        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.

Two variables

loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  time.with("scenario"),
  time.with("education_field"),
  time.with("workplace_peer_review"),
  time.with("mo(education_level)", edlvl_prior),
  
  # New model(s)
  time.with(c("scenario", "education_field")),
  time.with(c("scenario", "mo(education_level)"), edlvl_prior),
  time.with(c("scenario", "workplace_peer_review")),
  time.with(c("education_field", "mo(education_level)"), edlvl_prior),
  time.with(c("education_field", "workplace_peer_review")),
  time.with(c("mo(education_level)", "workplace_peer_review"), edlvl_prior)
)

Comparison

loo_result[2]
## $diffs
##                                                                               elpd_diff
## time.with(c("scenario", "mo(education_level)"), edlvl_prior)                   0.0     
## time.with(c("scenario", "education_field"))                                    0.0     
## time.with("scenario")                                                         -0.3     
## time.with(c("scenario", "workplace_peer_review"))                             -0.8     
## time.with("education_field")                                                  -1.3     
## time.with("workplace_peer_review")                                            -1.8     
## time.with()                                                                   -1.8     
## time.with(c("education_field", "workplace_peer_review"))                      -1.9     
## time.with("mo(education_level)", edlvl_prior)                                 -2.0     
## time.with(c("mo(education_level)", "workplace_peer_review"),     edlvl_prior) -2.5     
## time.with(c("education_field", "mo(education_level)"), edlvl_prior)           -2.5     
##                                                                               se_diff
## time.with(c("scenario", "mo(education_level)"), edlvl_prior)                   0.0   
## time.with(c("scenario", "education_field"))                                    2.1   
## time.with("scenario")                                                          2.5   
## time.with(c("scenario", "workplace_peer_review"))                              2.0   
## time.with("education_field")                                                   3.3   
## time.with("workplace_peer_review")                                             2.7   
## time.with()                                                                    3.0   
## time.with(c("education_field", "workplace_peer_review"))                       3.1   
## time.with("mo(education_level)", edlvl_prior)                                  2.4   
## time.with(c("mo(education_level)", "workplace_peer_review"),     edlvl_prior)  2.5   
## time.with(c("education_field", "mo(education_level)"), edlvl_prior)            2.9

Diagnostics

loo_result[1]
## $loos
## $loos$`time.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.6  7.2
## p_loo        14.5  2.8
## looic       735.2 14.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   886       
##  (0.5, 0.7]   (ok)       14    31.8%   331       
##    (0.7, 1]   (bad)       4     9.1%   53        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -366.1  8.0
## p_loo        17.3  3.7
## looic       732.1 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   786       
##  (0.5, 0.7]   (ok)       12    27.3%   328       
##    (0.7, 1]   (bad)       5    11.4%   51        
##    (1, Inf)   (very bad)  1     2.3%   14        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("education_field")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.1  6.2
## p_loo        12.9  2.0
## looic       734.1 12.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   749       
##  (0.5, 0.7]   (ok)       13    29.5%   286       
##    (0.7, 1]   (bad)       3     6.8%   106       
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("workplace_peer_review")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.6  6.4
## p_loo        12.8  2.3
## looic       735.2 12.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   557       
##  (0.5, 0.7]   (ok)       15    34.1%   326       
##    (0.7, 1]   (bad)       1     2.3%   47        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("mo(education_level)", edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.8  6.9
## p_loo        13.2  2.6
## looic       735.6 13.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   685       
##  (0.5, 0.7]   (ok)       13    29.5%   672       
##    (0.7, 1]   (bad)       4     9.1%   83        
##    (1, Inf)   (very bad)  1     2.3%   19        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "education_field"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -365.8  6.8
## p_loo        15.6  2.5
## looic       731.7 13.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    56.8%   435       
##  (0.5, 0.7]   (ok)       14    31.8%   425       
##    (0.7, 1]   (bad)       5    11.4%   50        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "mo(education_level)"), edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -365.8  7.3
## p_loo        15.2  2.9
## looic       731.6 14.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   391       
##  (0.5, 0.7]   (ok)       13    29.5%   129       
##    (0.7, 1]   (bad)       3     6.8%   49        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "workplace_peer_review"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -366.6  7.5
## p_loo        16.6  3.3
## looic       733.1 14.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     29    65.9%   457       
##  (0.5, 0.7]   (ok)       10    22.7%   172       
##    (0.7, 1]   (bad)       4     9.1%   177       
##    (1, Inf)   (very bad)  1     2.3%   23        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("education_field", "mo(education_level)"), edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.3  6.5
## p_loo        14.1  2.3
## looic       736.7 12.9
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     22    50.0%   628       
##  (0.5, 0.7]   (ok)       17    38.6%   629       
##    (0.7, 1]   (bad)       4     9.1%   62        
##    (1, Inf)   (very bad)  1     2.3%   36        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("education_field", "workplace_peer_review"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.7  5.8
## p_loo        12.7  1.8
## looic       735.4 11.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     27    61.4%   512       
##  (0.5, 0.7]   (ok)       12    27.3%   501       
##    (0.7, 1]   (bad)       5    11.4%   94        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("mo(education_level)", "workplace_peer_review"),     edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -368.3  6.8
## p_loo        13.5  2.6
## looic       736.5 13.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     24    54.5%   525       
##  (0.5, 0.7]   (ok)       16    36.4%   521       
##    (0.7, 1]   (bad)       3     6.8%   66        
##    (1, Inf)   (very bad)  1     2.3%   16        
## See help('pareto-k-diagnostic') for details.

Three variables

loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  time.with("scenario"),
  time.with(c("scenario", "education_field")),
  time.with(c("scenario", "mo(education_level)"), edlvl_prior),
  
  # New model(s)
  time.with(c("scenario", "mo(education_level)", "education_field"), edlvl_prior)
)

Comparison

loo_result[2]
## $diffs
##                                                                                     elpd_diff
## time.with(c("scenario", "mo(education_level)"), edlvl_prior)                         0.0     
## time.with(c("scenario", "education_field"))                                          0.0     
## time.with("scenario")                                                               -0.3     
## time.with(c("scenario", "mo(education_level)", "education_field"),     edlvl_prior) -0.7     
## time.with()                                                                         -1.8     
##                                                                                     se_diff
## time.with(c("scenario", "mo(education_level)"), edlvl_prior)                         0.0   
## time.with(c("scenario", "education_field"))                                          2.1   
## time.with("scenario")                                                                2.5   
## time.with(c("scenario", "mo(education_level)", "education_field"),     edlvl_prior)  1.4   
## time.with()                                                                          3.0

Diagnostics

loo_result[1]
## $loos
## $loos$`time.with()`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -367.6  7.2
## p_loo        14.5  2.8
## looic       735.2 14.3
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   886       
##  (0.5, 0.7]   (ok)       14    31.8%   331       
##    (0.7, 1]   (bad)       4     9.1%   53        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with("scenario")`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -366.1  8.0
## p_loo        17.3  3.7
## looic       732.1 16.1
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     26    59.1%   786       
##  (0.5, 0.7]   (ok)       12    27.3%   328       
##    (0.7, 1]   (bad)       5    11.4%   51        
##    (1, Inf)   (very bad)  1     2.3%   14        
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "education_field"))`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -365.8  6.8
## p_loo        15.6  2.5
## looic       731.7 13.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    56.8%   435       
##  (0.5, 0.7]   (ok)       14    31.8%   425       
##    (0.7, 1]   (bad)       5    11.4%   50        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "mo(education_level)"), edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -365.8  7.3
## p_loo        15.2  2.9
## looic       731.6 14.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     28    63.6%   391       
##  (0.5, 0.7]   (ok)       13    29.5%   129       
##    (0.7, 1]   (bad)       3     6.8%   49        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## $loos$`time.with(c("scenario", "mo(education_level)", "education_field"),     edlvl_prior)`
## 
## Computed from 4000 by 44 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -366.5  6.9
## p_loo        16.1  2.6
## looic       733.0 13.7
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     29    65.9%   307       
##  (0.5, 0.7]   (ok)       11    25.0%   109       
##    (0.7, 1]   (bad)       3     6.8%   115       
##    (1, Inf)   (very bad)  1     2.3%   12        
## See help('pareto-k-diagnostic') for details.

Candidate models

We pick some of our top-performing models as candidates and inspect them more closely.

The candidate models are named and listed in order of complexity.
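Once the candidate fits below exist, they can also be compared head-to-head with `loo_compare`. A sketch using the `time0` and `time1` fits defined in this section:

```r
# Direct elpd comparison of the named candidates; the top row is the model
# with the best estimated out-of-sample predictive performance.
loo_compare(loo(time0), loo(time1))
```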

Time0

We select the simplest model as a baseline.

time0 <- brm(
  "time ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(time0)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + (1 | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.42      0.15     0.09     0.71 1.00      784      991
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  7.54      0.15     7.25     7.85 1.00     2527
## high_debt_versionfalse    -0.18      0.17    -0.51     0.17 1.00     4094
##                        Tail_ESS
## Intercept                  2316
## high_debt_versionfalse     2711
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.67      1.10     1.89     6.13 1.00     1114     1722
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(time0)
## $session
## , , Intercept
## 
##                             Estimate Est.Error         Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.11275159 0.2860867 -0.685643550 0.4582609
## 6033d90a5af2c702367b3a96  0.30118528 0.2841025 -0.192779725 0.9023268
## 6034fc165af2c702367b3a98  0.31146058 0.2897695 -0.202207675 0.9059238
## 603500725af2c702367b3a99 -0.50158676 0.3736261 -1.239848500 0.1540829
## 603f97625af2c702367b3a9d -0.17573161 0.2929061 -0.763236875 0.3805710
## 603fd5d95af2c702367b3a9e  0.23260419 0.2826572 -0.282267400 0.8325673
## 60409b7b5af2c702367b3a9f  0.36297895 0.2969008 -0.149718000 0.9718112
## 604b82b5a7718fbed181b336 -0.32887367 0.3349227 -1.005135250 0.2731784
## 6050c1bf856f36729d2e5218  0.11487089 0.2748997 -0.388343300 0.6867415
## 6050e1e7856f36729d2e5219  0.07194507 0.2762681 -0.465090150 0.6481566
## 6055fdc6856f36729d2e521b -0.07517223 0.2752623 -0.622037225 0.4849628
## 60589862856f36729d2e521f  0.04470959 0.2798324 -0.476523000 0.6353896
## 605afa3a856f36729d2e5222 -0.07470695 0.2916273 -0.648323775 0.5008381
## 605c8bc6856f36729d2e5223 -0.35466018 0.3355242 -1.037378000 0.2627359
## 605f3f2d856f36729d2e5224 -0.11281797 0.2921469 -0.697781400 0.4773815
## 605f46c3856f36729d2e5225 -0.24574637 0.3096026 -0.862547675 0.3286244
## 60605337856f36729d2e5226  0.22756347 0.2824954 -0.265445925 0.8269080
## 60609ae6856f36729d2e5228  0.31306097 0.2932300 -0.203570125 0.9317685
## 6061ce91856f36729d2e522e -0.04088890 0.2757911 -0.587677450 0.5242268
## 6061f106856f36729d2e5231 -0.36697510 0.3340662 -1.055116250 0.2474501
## 6068ea9f856f36729d2e523e  0.65940365 0.3449495  0.003345279 1.3392622
## 6075ab05856f36729d2e5247 -0.30062645 0.3269950 -0.971954650 0.3085037

Sampling plots

plot(time0, ask = FALSE)

Posterior predictive check

pp_check(time0, nsamples = 150) + scale_x_log10()

Time1

We select the best-performing model with one variable.

time1 <- brm(
  "time ~ 1 + high_debt_version + scenario + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time1",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(time1)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + scenario + (1 | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.46      0.14     0.17     0.73 1.00      972     1288
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  7.65      0.16     7.34     7.97 1.00     2512
## high_debt_versionfalse    -0.15      0.16    -0.46     0.19 1.00     4702
## scenariotickets           -0.30      0.16    -0.62     0.03 1.00     4391
##                        Tail_ESS
## Intercept                  2862
## high_debt_versionfalse     2577
## scenariotickets            3089
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     4.24      1.29     2.17     7.17 1.00     1212     2108
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(time1)
## $session
## , , Intercept
## 
##                              Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.131157520 0.2894703 -0.69790085 0.4472849
## 6033d90a5af2c702367b3a96  0.345975163 0.2948088 -0.17240543 0.9621481
## 6034fc165af2c702367b3a98  0.312615721 0.2890198 -0.21111305 0.9314271
## 603500725af2c702367b3a99 -0.580967048 0.3753131 -1.29102175 0.1374856
## 603f97625af2c702367b3a9d -0.178034878 0.2968035 -0.75711637 0.3855893
## 603fd5d95af2c702367b3a9e  0.216871452 0.2843529 -0.30592422 0.8045455
## 60409b7b5af2c702367b3a9f  0.434482730 0.3044064 -0.10936117 1.0527390
## 604b82b5a7718fbed181b336 -0.384507154 0.3158815 -0.99940722 0.2023649
## 6050c1bf856f36729d2e5218  0.178131596 0.2899054 -0.37398580 0.7811601
## 6050e1e7856f36729d2e5219  0.084611385 0.2835651 -0.44492085 0.6639378
## 6055fdc6856f36729d2e521b -0.067260531 0.2879363 -0.62157198 0.5123126
## 60589862856f36729d2e521f  0.007323255 0.2802704 -0.53228525 0.5927170
## 605afa3a856f36729d2e5222 -0.090608450 0.2803041 -0.63434492 0.4693394
## 605c8bc6856f36729d2e5223 -0.424604250 0.3286785 -1.06578100 0.2004248
## 605f3f2d856f36729d2e5224 -0.084178630 0.2924085 -0.63096325 0.5377237
## 605f46c3856f36729d2e5225 -0.274337832 0.2973105 -0.84274730 0.2896524
## 60605337856f36729d2e5226  0.263460224 0.2959967 -0.27361325 0.8840842
## 60609ae6856f36729d2e5228  0.362073268 0.2928321 -0.17288075 0.9783804
## 6061ce91856f36729d2e522e -0.054703919 0.2956666 -0.63568705 0.5367869
## 6061f106856f36729d2e5231 -0.390416491 0.3256629 -1.03886450 0.2198758
## 6068ea9f856f36729d2e523e  0.804105705 0.3408505  0.07752689 1.4607338
## 6075ab05856f36729d2e5247 -0.357241946 0.3189408 -0.98133657 0.2437772

Sampling plots

plot(time1, ask = FALSE)

Posterior predictive check

pp_check(time1, nsamples = 150) + scale_x_log10()

Time2

We select the best performing model with two variables.

time2 <- brm(
  "time ~ 1 + high_debt_version + scenario + mo(education_level) + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape"),
    prior(dirichlet(2), class = "simo", coef = "moeducation_level1")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time2",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(time2)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + scenario + mo(education_level) + (1 | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.32      0.17     0.01     0.63 1.00      673      975
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  8.25      0.37     7.60     9.03 1.01     1415
## high_debt_versionfalse    -0.15      0.16    -0.47     0.17 1.00     4332
## scenariotickets           -0.32      0.17    -0.64     0.01 1.00     5152
## moeducation_level         -0.24      0.12    -0.47    -0.01 1.01     1299
##                        Tail_ESS
## Intercept                  1776
## high_debt_versionfalse     2746
## scenariotickets            2521
## moeducation_level          1913
## 
## Simplex Parameters: 
##                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## moeducation_level1[1]     0.34      0.16     0.06     0.66 1.00     2900
## moeducation_level1[2]     0.16      0.11     0.02     0.45 1.00     3883
## moeducation_level1[3]     0.17      0.12     0.02     0.46 1.00     4003
## moeducation_level1[4]     0.32      0.15     0.07     0.63 1.00     4069
##                       Tail_ESS
## moeducation_level1[1]     2878
## moeducation_level1[2]     2719
## moeducation_level1[3]     2963
## moeducation_level1[4]     3137
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.97      1.24     2.11     6.91 1.00      947     2244
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(time2)
## $session
## , , Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.06489116 0.2373079 -0.56041770 0.4161218
## 6033d90a5af2c702367b3a96  0.14098737 0.2514511 -0.29310287 0.7132378
## 6034fc165af2c702367b3a98  0.24017793 0.2681683 -0.17918125 0.8398987
## 603500725af2c702367b3a99 -0.45162733 0.4026797 -1.31337375 0.1285293
## 603f97625af2c702367b3a9d -0.10064558 0.2423361 -0.62516282 0.3661429
## 603fd5d95af2c702367b3a9e  0.05787041 0.2444601 -0.42858840 0.5860447
## 60409b7b5af2c702367b3a9f  0.34139368 0.2947144 -0.08852588 0.9886314
## 604b82b5a7718fbed181b336 -0.13188201 0.2644422 -0.71980265 0.3506500
## 6050c1bf856f36729d2e5218  0.15362917 0.2503672 -0.27582955 0.7191141
## 6050e1e7856f36729d2e5219 -0.03175075 0.2450888 -0.54740960 0.4872661
## 6055fdc6856f36729d2e521b -0.08358514 0.2496640 -0.60834568 0.4079105
## 60589862856f36729d2e521f  0.14534943 0.2611730 -0.32717898 0.7414337
## 605afa3a856f36729d2e5222 -0.04418395 0.2387607 -0.53486885 0.4489215
## 605c8bc6856f36729d2e5223 -0.16713611 0.2752103 -0.79219780 0.3118615
## 605f3f2d856f36729d2e5224 -0.03562485 0.2365540 -0.51963285 0.4530460
## 605f46c3856f36729d2e5225 -0.06168182 0.2582907 -0.61540618 0.4564646
## 60605337856f36729d2e5226  0.21400610 0.2712890 -0.21215202 0.8405765
## 60609ae6856f36729d2e5228  0.21802292 0.2638524 -0.19673462 0.7920115
## 6061ce91856f36729d2e522e -0.12484399 0.2626830 -0.69784477 0.3467587
## 6061f106856f36729d2e5231 -0.33444313 0.3365499 -1.05775775 0.1775895
## 6068ea9f856f36729d2e523e  0.31627952 0.3397728 -0.20239922 1.0827567
## 6075ab05856f36729d2e5247 -0.11935109 0.2655445 -0.71282735 0.3687091

Sampling plots

plot(time2, ask = FALSE)

Posterior predictive check

pp_check(time2, nsamples = 150) + scale_x_log10()

Time3

We select the second best performing model with two variables.

time3 <- brm(
  "time ~ 1 + high_debt_version + scenario + education_field + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time3",
  file_refit = "on_change",
  seed = 20210421
)

Summary

summary(time3)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + scenario + education_field + (1 | session) 
##    Data: as.data.frame(d.both_completed) (Number of observations: 44) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 22) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.35      0.16     0.03     0.65 1.00      734      848
## 
## Population-Level Effects: 
##                                    Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                              7.65      0.23     7.21     8.11 1.00
## high_debt_versionfalse                -0.17      0.16    -0.48     0.15 1.00
## scenariotickets                       -0.33      0.16    -0.65    -0.01 1.00
## education_fieldInteractionDesign      -0.46      0.52    -1.47     0.57 1.00
## education_fieldNone                    1.06      0.49     0.13     2.05 1.00
## education_fieldSoftwareEngineering     0.00      0.25    -0.50     0.47 1.00
##                                    Bulk_ESS Tail_ESS
## Intercept                              2651     2411
## high_debt_versionfalse                 5421     2421
## scenariotickets                        4774     2579
## education_fieldInteractionDesign       2879     2518
## education_fieldNone                    2997     2649
## education_fieldSoftwareEngineering     2214     1912
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     4.14      1.27     2.19     6.95 1.00     1164     2395
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Random effects

ranef(time3)
## $session
## , , Intercept
## 
##                             Estimate Est.Error       Q2.5     Q97.5
## 6033d69a5af2c702367b3a95 -0.08293346 0.2762997 -0.6506655 0.4595537
## 6033d90a5af2c702367b3a96  0.28430349 0.2753035 -0.1679328 0.8754226
## 6034fc165af2c702367b3a98  0.25582499 0.2663495 -0.1811189 0.8351591
## 603500725af2c702367b3a99 -0.43005665 0.3568812 -1.1641978 0.1529283
## 603f97625af2c702367b3a9d -0.13018561 0.2717640 -0.7127580 0.3790718
## 603fd5d95af2c702367b3a9e  0.18674536 0.2593710 -0.2540311 0.7503900
## 60409b7b5af2c702367b3a9f  0.36812731 0.2939810 -0.0971086 0.9932092
## 604b82b5a7718fbed181b336 -0.28082771 0.2978583 -0.9004351 0.2459909
## 6050c1bf856f36729d2e5218  0.15968857 0.2621604 -0.3126920 0.7298558
## 6050e1e7856f36729d2e5219  0.08134850 0.2504322 -0.3900459 0.6282933
## 6055fdc6856f36729d2e521b -0.04555245 0.2633609 -0.5669126 0.4875088
## 60589862856f36729d2e521f  0.01574282 0.2465412 -0.4567513 0.5440192
## 605afa3a856f36729d2e5222 -0.06195952 0.2684858 -0.6160360 0.4804998
## 605c8bc6856f36729d2e5223 -0.30838067 0.3069765 -0.9366980 0.2177001
## 605f3f2d856f36729d2e5224 -0.05390679 0.2672687 -0.5984670 0.4632304
## 605f46c3856f36729d2e5225 -0.20080961 0.2769727 -0.8019196 0.3072222
## 60605337856f36729d2e5226  0.22553691 0.2748839 -0.2484014 0.8242951
## 60609ae6856f36729d2e5228  0.30829846 0.2922382 -0.1723765 0.9343437
## 6061ce91856f36729d2e522e -0.02434554 0.2449203 -0.5018119 0.4834777
## 6061f106856f36729d2e5231 -0.28477542 0.3054150 -0.9283804 0.2438914
## 6068ea9f856f36729d2e523e  0.12585163 0.3566877 -0.5549790 0.9069138
## 6075ab05856f36729d2e5247 -0.07493014 0.3733400 -0.8785455 0.6827192

Sampling plots

plot(time3, ask = FALSE)

Posterior predictive check

pp_check(time3, nsamples = 150) + scale_x_log10()

Final Model

All candidate models look fine and none is significantly better than the others, so we will proceed with the simplest model: time0

Variations

We will try a few different variations of the selected candidate model.

All data points

Some participants completed only one scenario. They were excluded from the initial dataset to improve sampling of the models. However, we want to use all the data we can, and will therefore try to fit the model on the complete dataset.
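
The exclusion itself is straightforward to express. Below is a hypothetical, self-contained sketch (using a toy data frame, not the real d.completed) of how participants with only one completed scenario can be filtered out with dplyr:

```r
library(dplyr)

# Toy data (hypothetical): participant "s2" completed only one scenario.
toy <- tibble(
  session  = c("s1", "s1", "s2"),
  scenario = c("booking", "tickets", "booking"),
  time     = c(900, 1200, 800)
)

# Keep only sessions with both scenarios completed, mirroring how
# d.both_completed relates to d.completed (an assumption about the setup).
toy.both <- toy %>%
  group_by(session) %>%
  filter(n() == 2) %>%
  ungroup()

toy.both$session  # only "s1" rows remain
```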

time0.all <- brm(
  "time ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0.all",
  file_refit = "on_change",
  seed = 20210421
)
Summary

summary(time0.all)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + (1 | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.43      0.14     0.13     0.69 1.00      926      958
## 
## Population-Level Effects: 
##                        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                  7.53      0.14     7.26     7.82 1.00     2362
## high_debt_versionfalse    -0.20      0.16    -0.49     0.11 1.00     4301
##                        Tail_ESS
## Intercept                  2409
## high_debt_versionfalse     2972
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.76      1.05     2.08     6.08 1.00     1169     1896
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Random effects

ranef(time0.all)
## $session
## , , Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  0.10743406 0.3215793 -0.49176255 0.7797215
## 6033d69a5af2c702367b3a95 -0.11393560 0.2878708 -0.68202983 0.4603690
## 6033d90a5af2c702367b3a96  0.32426832 0.2884523 -0.18827020 0.9204864
## 6034fc165af2c702367b3a98  0.32566422 0.2880610 -0.19423385 0.9328176
## 603500725af2c702367b3a99 -0.51453087 0.3640134 -1.24499075 0.1281945
## 603f84f15af2c702367b3a9b -0.36670177 0.3991751 -1.18778925 0.3558798
## 603f97625af2c702367b3a9d -0.18153951 0.2919422 -0.74118115 0.4041113
## 603fd5d95af2c702367b3a9e  0.25168127 0.2876644 -0.27646243 0.8557478
## 60409b7b5af2c702367b3a9f  0.38228571 0.3033029 -0.16228467 1.0108630
## 604b82b5a7718fbed181b336 -0.33134416 0.3220736 -0.98139030 0.2840337
## 604f1239a7718fbed181b33f -0.02585933 0.3359783 -0.66638825 0.6747302
## 6050c1bf856f36729d2e5218  0.12702070 0.2844274 -0.39857212 0.7282526
## 6050e1e7856f36729d2e5219  0.08732978 0.2788988 -0.42141037 0.6620035
## 6055fdc6856f36729d2e521b -0.06830187 0.2847831 -0.63595455 0.5284143
## 60579f2a856f36729d2e521e -0.05001702 0.3367932 -0.70177923 0.6469537
## 60589862856f36729d2e521f  0.05826002 0.2851661 -0.47828292 0.6554365
## 605a30a7856f36729d2e5221 -0.20497733 0.3589553 -0.95256537 0.4872980
## 605afa3a856f36729d2e5222 -0.06369539 0.2935347 -0.64654803 0.5295743
## 605c8bc6856f36729d2e5223 -0.36294381 0.3313016 -1.02561925 0.2553571
## 605f3f2d856f36729d2e5224 -0.11411407 0.2794536 -0.64467565 0.4437461
## 605f46c3856f36729d2e5225 -0.25334110 0.3045431 -0.85241390 0.3196846
## 60605337856f36729d2e5226  0.24639023 0.2910492 -0.26199735 0.8719473
## 60609ae6856f36729d2e5228  0.33274912 0.2890039 -0.18430892 0.9341700
## 6061ce91856f36729d2e522e -0.03236292 0.2815524 -0.57206517 0.5407210
## 6061f106856f36729d2e5231 -0.37393379 0.3343367 -1.00906150 0.2773342
## 60672faa856f36729d2e523c -0.12075577 0.3383869 -0.79444963 0.5510030
## 6068ea9f856f36729d2e523e  0.69985604 0.3314307  0.03456946 1.3554313
## 606db69d856f36729d2e5243  0.57103782 0.3699743 -0.05728208 1.3535968
## 6075ab05856f36729d2e5247 -0.30871951 0.3183636 -0.92231982 0.2797401
Sampling plots

plot(time0.all, ask = FALSE)

Posterior predictive check

pp_check(time0.all, nsamples = 150) + scale_x_log10()

With experience predictor

As including all data points did not harm the model, we will fit this variant on the complete dataset as well.

time0.all.exp <- brm(
  "time ~ 1 + high_debt_version + work_experience_programming.s + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)
Summary

summary(time0.all.exp)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + work_experience_programming.s + (1 | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.45      0.14     0.14     0.71 1.01      708      654
## 
## Population-Level Effects: 
##                               Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                         7.54      0.14     7.26     7.82 1.00
## high_debt_versionfalse           -0.20      0.16    -0.50     0.10 1.00
## work_experience_programming.s     0.04      0.12    -0.20     0.28 1.00
##                               Bulk_ESS Tail_ESS
## Intercept                         1883     2616
## high_debt_versionfalse            4496     2962
## work_experience_programming.s     1589     2248
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     3.75      1.05     2.04     6.02 1.00     1059     1646
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Random effects

ranef(time0.all.exp)
## $session
## , , Intercept
## 
##                             Estimate Est.Error        Q2.5     Q97.5
## 6033c6fc5af2c702367b3a93  0.11436634 0.3386521 -0.52261055 0.8266039
## 6033d69a5af2c702367b3a95 -0.10323867 0.2966430 -0.68971485 0.4861681
## 6033d90a5af2c702367b3a96  0.33873191 0.2935511 -0.18501772 0.9456097
## 6034fc165af2c702367b3a98  0.34495015 0.3037295 -0.19167927 0.9722129
## 603500725af2c702367b3a99 -0.51896554 0.3736859 -1.26467925 0.1508850
## 603f84f15af2c702367b3a9b -0.38313522 0.4158486 -1.23419975 0.3778444
## 603f97625af2c702367b3a9d -0.17090340 0.3091673 -0.76201672 0.4497196
## 603fd5d95af2c702367b3a9e  0.27638045 0.2904963 -0.26957063 0.8767928
## 60409b7b5af2c702367b3a9f  0.40718387 0.3049947 -0.13132455 1.0372023
## 604b82b5a7718fbed181b336 -0.33476385 0.3332111 -0.98945650 0.2977670
## 604f1239a7718fbed181b33f -0.03143005 0.3278864 -0.67265365 0.6432929
## 6050c1bf856f36729d2e5218  0.12539732 0.2860831 -0.42352197 0.7080399
## 6050e1e7856f36729d2e5219  0.08576607 0.2765233 -0.43495113 0.6692808
## 6055fdc6856f36729d2e521b -0.05793969 0.2966705 -0.64860180 0.5421729
## 60579f2a856f36729d2e521e -0.05592628 0.3547309 -0.75353255 0.6811279
## 60589862856f36729d2e521f  0.01215595 0.3402376 -0.65264142 0.7036291
## 605a30a7856f36729d2e5221 -0.20484817 0.3679164 -0.96836420 0.5002716
## 605afa3a856f36729d2e5222 -0.10318751 0.3136994 -0.72476087 0.5194962
## 605c8bc6856f36729d2e5223 -0.38926449 0.3301529 -1.05361050 0.2138376
## 605f3f2d856f36729d2e5224 -0.17747834 0.3823643 -0.98431508 0.5444238
## 605f46c3856f36729d2e5225 -0.24906402 0.3145777 -0.89621808 0.3507401
## 60605337856f36729d2e5226  0.26867787 0.2941246 -0.24631512 0.8860694
## 60609ae6856f36729d2e5228  0.36009662 0.2998216 -0.17088095 0.9726449
## 6061ce91856f36729d2e522e -0.02747183 0.2857916 -0.57670435 0.5517471
## 6061f106856f36729d2e5231 -0.37962825 0.3406161 -1.04951750 0.2722603
## 60672faa856f36729d2e523c -0.11292912 0.3609199 -0.82503188 0.6248903
## 6068ea9f856f36729d2e523e  0.71307170 0.3365822  0.03128014 1.3653205
## 606db69d856f36729d2e5243  0.55498432 0.3645415 -0.07069035 1.3209760
## 6075ab05856f36729d2e5247 -0.30613297 0.3229255 -0.95367582 0.2899334
Loo comparison

loo(
  time0.all,
  time0.all.exp
)
## Output of model 'time0.all':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -425.1  7.8
## p_loo        17.5  3.1
## looic       850.2 15.6
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     25    49.0%   758       
##  (0.5, 0.7]   (ok)       21    41.2%   258       
##    (0.7, 1]   (bad)       5     9.8%   70        
##    (1, Inf)   (very bad)  0     0.0%   <NA>      
## See help('pareto-k-diagnostic') for details.
## 
## Output of model 'time0.all.exp':
## 
## Computed from 4000 by 51 log-likelihood matrix
## 
##          Estimate   SE
## elpd_loo   -425.7  7.8
## p_loo        18.2  3.2
## looic       851.4 15.5
## ------
## Monte Carlo SE of elpd_loo is NA.
## 
## Pareto k diagnostic values:
##                          Count Pct.    Min. n_eff
## (-Inf, 0.5]   (good)     30    58.8%   756       
##  (0.5, 0.7]   (ok)       16    31.4%   362       
##    (0.7, 1]   (bad)       4     7.8%   42        
##    (1, Inf)   (very bad)  1     2.0%   12        
## See help('pareto-k-diagnostic') for details.
## 
## Model comparisons:
##               elpd_diff se_diff
## time0.all      0.0       0.0   
## time0.all.exp -0.6       0.5
Sampling plots

plot(time0.all.exp, ask = FALSE)

Posterior predictive check

pp_check(time0.all.exp, nsamples = 150) + scale_x_log10()

Final model

  • Fitting the model to all data points did not significantly damage the model fit, and since the complete dataset is a fairer representation of reality, we will use it.
  • Adding the experience predictor did not significantly damage the model fit, and we will keep it as it provides useful insight.

This means that our final model, fitted on all data points and including the experience predictor, is time0.all.exp.
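
Before interpreting coefficients, the fitted effect could also be inspected graphically; a sketch (not run in this analysis) using brms's built-in conditional effects plot:

```r
# Sketch (assumes the fitted model object time0.all.exp from above):
# plot the conditional effect of debt version on predicted time.
conditional_effects(time0.all.exp, effects = "high_debt_version")
```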

Interpreting the model

summary(time0.all.exp, prob = 0.89)
##  Family: negbinomial 
##   Links: mu = log; shape = identity 
## Formula: time ~ 1 + high_debt_version + work_experience_programming.s + (1 | session) 
##    Data: as.data.frame(d.completed) (Number of observations: 51) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~session (Number of levels: 29) 
##               Estimate Est.Error l-89% CI u-89% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.45      0.14     0.22     0.66 1.01      708      654
## 
## Population-Level Effects: 
##                               Estimate Est.Error l-89% CI u-89% CI Rhat
## Intercept                         7.54      0.14     7.31     7.77 1.00
## high_debt_versionfalse           -0.20      0.16    -0.45     0.05 1.00
## work_experience_programming.s     0.04      0.12    -0.15     0.23 1.00
##                               Bulk_ESS Tail_ESS
## Intercept                         1883     2616
## high_debt_versionfalse            4496     2962
## work_experience_programming.s     1589     2248
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-89% CI u-89% CI Rhat Bulk_ESS Tail_ESS
## shape     3.75      1.05     2.28     5.54 1.00     1059     1646
## 
## Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Effects sizes

We start by extracting posterior samples, predicting for a participant with 10 years of programming experience (standardized against the sample).

post <- posterior_predict(time0.all.exp, newdata = data.frame(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = (10 - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
))
post.low <-  post[,1]
post.high <- post[,2]
summary(data.frame(
  "Low Debt"=post.low,
  "High Debt"=post.high
))
##     Low.Debt         High.Debt    
##  Min.   :   31.0   Min.   :   57  
##  1st Qu.:  832.8   1st Qu.: 1022  
##  Median : 1396.5   Median : 1699  
##  Mean   : 1728.9   Mean   : 2103  
##  3rd Qu.: 2242.2   3rd Qu.: 2743  
##  Max.   :14944.0   Max.   :14778
post.diff <- post.high - post.low
post.diff.scaled <- post.diff / mean(post)

When we compare the posterior distributions for the high and low debt versions we can see a difference, but they are quite similar.

ggplot(data.frame(time = post.high)) +
  geom_density(data = data.frame(time = post.low), aes(x=time/60, colour = "blue")) +
  geom_density(data = data.frame(time = post.high),  aes(x=time/60, colour = "red", linetype="dashed" )) +
  scale_x_log10() +
  labs(
    title = "Posterior density of time for high and low debt versions",
    subtitle = "Notice! x-axis is log10 scaled.",
    x ="Time (min)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL) +
  scale_colour_manual(name="Version",labels=c("Low Debt", "High Debt"), values=c("blue", "red"))

We can also plot the difference in time between the high debt version and the low debt version. In those graphs we see that the difference is centered around zero, with slightly more weight on the positive side, indicating slightly longer times for the high debt version.

ggplot(data.frame(x = post.diff/60)) + geom_density(aes(x=x)) +
  labs(
    title = "Posterior density of time difference low and high debt versions ",
    subtitle = "Difference = High Debt time - Low Debt time",
    x ="Time Difference (min)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL)

data.frame(x = post.diff.scaled) %>%
  ggplot() +
  geom_density(aes(x=x)) +
  labs(
    title = "Posterior density of time difference low and high debt versions ",
    subtitle = "Difference = High Debt time - Low Debt time",
    x ="Time Difference (proportion of mean time)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL)

We can also check the probability of the high debt effect being less than, as well as greater than, zero.

sprintf("high debt effect < 0: %.2f%%", sum(sign(post.diff) == -1)/length(post.diff) * 100)
## [1] "high debt effect < 0: 40.35%"
sprintf("high debt effect > 0: %.2f%%", sum(sign(post.diff) == 1)/length(post.diff) * 100)
## [1] "high debt effect > 0: 59.62%"

We can see no “significant” relation between the amount of technical debt in the given codebase and the time spent on the task.
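
The same directional question can also be put to the model coefficient directly with brms's hypothesis() function. Since high_debt_versionfalse encodes the low-debt version, a negative coefficient corresponds to longer times for the high-debt version (a sketch, not run in this analysis):

```r
# Sketch (assumes the fitted model object time0.all.exp from above):
# posterior probability that the low-debt coefficient is negative,
# i.e. that the high-debt version takes longer.
hypothesis(time0.all.exp, "high_debt_versionfalse < 0")
```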

---
title: "Time Outcome"
author: Hampus Broman & William Levén
date: 2021-04
output: 
  html_document: 
    pandoc_args: [ "-o", "html/time.html" ]
---

```{r include-setup, include=FALSE}
# Load setup file
source(knitr::purl('setup.Rmd', output = tempfile()))
```


## Looking at the data

We plot the data and can see that there is no obvious large difference between the high and low debt versions.

```{r plot}
d.both_completed %>%
  ggplot(aes(x=time/60, fill=high_debt_version)) + 
  geom_boxplot() +
  labs(
    title = "Distribution of time measurements for the different debt levels",
    subtitle = "Notice! Log10 x-scale",
    x ="Time (min)"
  ) +
  scale_y_continuous(breaks = NULL) +
  scale_x_log10() +
  scale_fill_discrete(name = "Debt Level", labels = c("High Debt", "Low Debt"), guide = guide_legend(reverse = TRUE)) +
  theme_minimal()
```

## Descriptive Statistics

```{r descriptive-statistics}
d.both_completed %>%
  pull(time) %>% 
  summary()

sprintf("Variance: %.0f", var(pull(d.both_completed, time)))
```

## Initial model
As the variance is much greater than the mean we will use a negative binomial family that allows us to model the variance separately.
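
To make the overdispersion concrete: for a negative binomial, Var(Y) = μ + μ²/φ, where φ is the shape parameter, so the variance can exceed the mean by orders of magnitude. A small illustration with the observed mean and a hypothetical shape value:

```{r negbinomial-overdispersion-sketch}
# Illustration only: observed mean from the descriptive statistics and a
# hypothetical shape value (the model estimates shape from the data).
mu <- 1842
shape <- 4
mu + mu^2 / shape  # implied variance, vastly larger than the mean
```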

We include `high_debt_version` as a predictor in our model, as this variable represents the very effect we want to measure.
We also include a varying intercept for each individual to prevent the model from learning too much from single participants with extreme measurements.

### Selecting Priors {.tabset}

We iterate over the model until we have sane priors.

#### Base model with priors

```{r initial-model-definition, class.source = 'fold-show'}
time.with <- extendable_model(
  base_name = "time",
  base_formula = "time ~ 1 + high_debt_version + (1 | session)",
  base_priors = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = d.both_completed,
  base_control = list(adapt_delta = 0.95)
)
```

#### Default priors

```{r default-priors}
prior_summary(time.with(only_priors = TRUE))
```

#### Selected priors

```{r selected-priors}
prior_summary(time.with(sample_prior = "only"))
```

#### Prior Predictive Check

```{r priors-check, warning=FALSE}
pp_check(time.with(sample_prior = "only"), nsamples = 200) + 
  scale_x_log10()
```

### Model fit {.tabset}

We check the posterior distribution and can see that the model seems to fit the data well.
Sampling also seems to have worked well, as Rhat values are close to 1 and the sampling plots look healthy.
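
The Rhat inspection can also be done numerically; a sketch (assuming the fitted base model from `time.with()`):

```{r rhat-check-sketch, eval=FALSE}
# Largest split-Rhat across all parameters; values close to 1
# indicate that the chains have mixed well.
max(rhat(time.with()))
```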

#### Posterior Predictive check

Notice: log10 scale on x-axis.

```{r base-pp-check}
pp_check(time.with(), nsamples = 100) +
  scale_x_log10()
```

#### Summary

```{r base-summary}
summary(time.with())
```

#### Sampling plots

```{r base-plot}
plot(time.with(), ask = FALSE)
```

## Model predictor extensions {.tabset}

```{r mo-priors}
# default prior for monotonic predictor
edlvl_prior <- prior(dirichlet(2), class = "simo", coef = "moeducation_level1")
```


We use `loo` to check some possible extensions of the model.

### One variable {.tabset}

```{r model-extension-1, warning=FALSE, class.source = 'fold-show'}
loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  
  # New model(s)
  time.with("work_domain"),
  time.with("work_experience_programming.s"),
  time.with("work_experience_java.s"),
  time.with("education_field"),
  time.with("mo(education_level)", edlvl_prior),
  time.with("workplace_peer_review"),
  time.with("workplace_td_tracking"),
  time.with("workplace_pair_programming"),
  time.with("workplace_coding_standards"),
  time.with("scenario"),
  time.with("group")
)
```

#### Comparison

```{r model-extension-1-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-1-dig, warning=FALSE}
loo_result[1]
```

### Two variables {.tabset}

```{r model-extension-2, warning=FALSE, class.source = 'fold-show'}
loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  time.with("scenario"),
  time.with("education_field"),
  time.with("workplace_peer_review"),
  time.with("mo(education_level)", edlvl_prior),
  
  # New model(s)
  time.with(c("scenario", "education_field")),
  time.with(c("scenario", "mo(education_level)"), edlvl_prior),
  time.with(c("scenario", "workplace_peer_review")),
  time.with(c("education_field", "mo(education_level)"), edlvl_prior),
  time.with(c("education_field", "workplace_peer_review")),
  time.with(c("mo(education_level)", "workplace_peer_review"), edlvl_prior)
)
```

#### Comparison

```{r model-extension-2-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-2-dig, warning=FALSE}
loo_result[1]
```


### Three variables {.tabset}

```{r model-extension-3, warning=FALSE, class.source = 'fold-show'}
loo_result <- loo(
  # Benchmark model(s)
  time.with(),
  time.with("scenario"),
  time.with(c("scenario", "education_field")),
  time.with(c("scenario", "mo(education_level)"), edlvl_prior),
  
  # New model(s)
  time.with(c("scenario", "mo(education_level)", "education_field"), edlvl_prior)
)
```

#### Comparison

```{r model-extension-3-sum, warning=FALSE}
loo_result[2]
```

#### Diagnostics

```{r model-extension-3-dig, warning=FALSE}
loo_result[1]
```


## Candidate models  {.tabset}
We pick some of our top-performing models as candidates and inspect them more closely.

The candidate models are named and listed in order of complexity.

### Time0  {.tabset}

We select the simplest model as a baseline.

```{r time0, class.source = 'fold-show'}
time0 <- brm(
  "time ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r time0-sum}
summary(time0)
```

#### Random effects

```{r time0-raneff}
ranef(time0)
```

#### Sampling plots

```{r time0-plot}
plot(time0, ask = FALSE)
```

#### Posterior predictive check

```{r time0-pp}
pp_check(time0, nsamples = 150) + scale_x_log10()
```

### Time1  {.tabset}

We select the best performing model with one variable.

```{r time1, class.source = 'fold-show'}
time1 <- brm(
  "time ~ 1 + high_debt_version + scenario + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time1",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r time1-sum}
summary(time1)
```

#### Random effects

```{r time1-raneff}
ranef(time1)
```

#### Sampling plots

```{r time1-plot}
plot(time1, ask = FALSE)
```

#### Posterior predictive check

```{r time1-pp}
pp_check(time1, nsamples = 150) + scale_x_log10()
```

### Time2  {.tabset}

We select the best performing model with two variables.

```{r time2, class.source = 'fold-show'}
time2 <- brm(
  "time ~ 1 + high_debt_version + scenario + mo(education_level) + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape"),
    prior(dirichlet(2), class = "simo", coef = "moeducation_level1")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time2",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r time2-sum}
summary(time2)
```

#### Random effects

```{r time2-raneff}
ranef(time2)
```

#### Sampling plots

```{r time2-plot}
plot(time2, ask = FALSE)
```

#### Posterior predictive check

```{r time2-pp}
pp_check(time2, nsamples = 150) + scale_x_log10()
```

### Time3  {.tabset}

We select the second best performing model with two variables.

```{r time3, class.source = 'fold-show'}
time3 <- brm(
  "time ~ 1 + high_debt_version + scenario + education_field + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.both_completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time3",
  file_refit = "on_change",
  seed = 20210421
)
```

#### Summary

```{r time3-sum}
summary(time3)
```

#### Random effects

```{r time3-raneff}
ranef(time3)
```

#### Sampling plots

```{r time3-plot}
plot(time3, ask = FALSE)
```

#### Posterior predictive check

```{r time3-pp}
pp_check(time3, nsamples = 150) + scale_x_log10()
```

## Final Model 
All candidate models look reasonable and none is significantly better than the others, so we will proceed with the simplest model: `time0`
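
As a sanity check on this choice, the four candidates can be compared directly with `loo`, as was done for the model extensions above (a sketch; the chunk name is ours):

```{r candidate-loo, warning=FALSE}
loo(
  time0,
  time1,
  time2,
  time3
)
```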

### Variations {.tabset}
We will try a few different variations of the selected candidate model.

#### All data points {.tabset}

Some participants completed only one scenario. They were excluded from the initial dataset to improve sampling of the models. We do, however, want to use all the data we can and will therefore try to fit the model on the complete dataset.

```{r variation.all, class.source = 'fold-show'}
time0.all <- brm(
  "time ~ 1 + high_debt_version + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0.all",
  file_refit = "on_change",
  seed = 20210421
)
```

##### Summary

```{r variation.all-sum}
summary(time0.all)
```

##### Random effects

```{r variation.all-raneff}
ranef(time0.all)
```

##### Sampling plots

```{r variation.all-plot}
plot(time0.all, ask = FALSE)
```

##### Posterior predictive check

```{r variation.all-pp}
pp_check(time0.all, nsamples = 150) + scale_x_log10()
```

#### With experience predictor {.tabset}

Since including all data points did not harm the model, we create this variant with all data points as well.

```{r variation.all.exp, class.source = 'fold-show'}
time0.all.exp <- brm(
  "time ~ 1 + high_debt_version + work_experience_programming.s + (1 | session)",
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(7.5, 1), class = "Intercept"),
    prior(exponential(1), class = "sd"),
    prior(gamma(0.01, 0.01), class = "shape")
  ),
  family = negbinomial(),
  data = as.data.frame(d.completed),
  control = list(adapt_delta = 0.95),
  file = "fits/time0.all.exp",
  file_refit = "on_change",
  seed = 20210421
)
```

##### Summary

```{r variation.all.exp-sum}
summary(time0.all.exp)
```

##### Random effects

```{r variation.all.exp-raneff}
ranef(time0.all.exp)
```

##### Loo comparison

```{r variation.all.exp-loo, warning=FALSE}
loo(
  time0.all,
  time0.all.exp
)
```

##### Sampling plots

```{r variation.all.exp-plot}
plot(time0.all.exp, ask = FALSE)
```

##### Posterior predictive check

```{r variation.all.exp-pp}
pp_check(time0.all.exp, nsamples = 150) + scale_x_log10()
```

### Final model
* Fitting the model to all data points did not significantly damage it, and the full dataset will be used as it is a fairer representation of reality.
* Adding the experience predictor did not significantly damage the model and will be kept as it provides useful insight.

This means that our final model, with all data points and the experience predictor, is `time0.all.exp`.

## Interpreting the model

```{r}
summary(time0.all.exp, prob = 0.89)
```

### Effects sizes
We start by drawing samples from the posterior predictive distribution.
```{r}
post <- posterior_predict(time0.all.exp, newdata = data.frame(
  high_debt_version = c("false", "true"),
  session = NA,
  work_experience_programming.s = (10 - mean(d.completed$work_experience_programming))/ sd(d.completed$work_experience_programming)
))
post.low <-  post[,1]
post.high <- post[,2]
summary(data.frame(
  "Low Debt"=post.low,
  "High Debt"=post.high
))
post.diff <- post.high - post.low
post.diff.scaled <- post.diff / mean(post)
```


When we compare the distributions for the high and low debt versions we can see a difference, but they are quite similar.
```{r}
ggplot(data.frame(time = post.high)) +
  geom_density(data = data.frame(time = post.low), aes(x=time/60, colour = "blue")) +
  geom_density(data = data.frame(time = post.high),  aes(x=time/60, colour = "red", linetype="dashed" )) +
  scale_x_log10() +
  labs(
    title = "Posterior density of time for high and low debt versions",
    subtitle = "Notice! x-axis is log10 scaled.",
    x ="Time (min)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL) +
  scale_colour_manual(name="Version",labels=c("Low Debt", "High Debt"), values=c("blue", "red"))

```

We can also plot the difference between the times for the high debt and low debt versions. In these graphs we see that the difference is centered around zero, with slightly more weight on the positive side (indicating slightly longer times for the high debt version).
```{r}

ggplot(data.frame(x = post.diff/60)) + geom_density(aes(x=x)) +
  labs(
    title = "Posterior density of time difference low and high debt versions ",
    subtitle = "Difference = High Debt time - Low Debt time",
    x ="Time Difference (min)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL)


data.frame(x = post.diff.scaled) %>%
  ggplot() +
  geom_density(aes(x=x)) +
  labs(
    title = "Posterior density of time difference low and high debt versions ",
    subtitle = "Difference = High Debt time - Low Debt time",
    x ="Time Difference (proportion of mean time)",
    y = "Density"
  ) +
  theme_minimal() +
  scale_y_continuous(breaks = NULL)

```
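
We can also summarise the scaled difference numerically; a sketch using the same 89% level as the `prob = 0.89` model summary above:

```{r}
sprintf(
  "Scaled difference median: %.2f, 89%% interval: [%.2f, %.2f]",
  median(post.diff.scaled),
  quantile(post.diff.scaled, 0.055),
  quantile(post.diff.scaled, 0.945)
)
```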


We can also check the probability of the high debt effect being less than, as well as greater than, zero.
```{r}
sprintf("high debt effect < 0: %.2f%%", sum(sign(post.diff) == -1)/length(post.diff) * 100)
sprintf("high debt effect > 0: %.2f%%", sum(sign(post.diff) == 1)/length(post.diff) * 100)
```

We see no "significant" relation between the amount of technical debt in the given codebase and the time spent on the task.

